Decision-Tree based Error Correction for Statistical Phrase Break Prediction in Korean
Abstract
In this paper, we present a new phrase break prediction architecture that integrates a probabilistic approach with decision-tree based error correction. The probabilistic method alone usually suffers from performance degradation due to inherent data sparseness problems, and it covers only a limited range of contextual information. Moreover, the probabilistic module cannot utilize selective morpheme tags or the relative distance to other phrase breaks. Decision-tree based error correction was tightly integrated to overcome these limitations. The morpheme sequence, initially tagged with phrase breaks, is corrected by an error-correcting decision tree that was induced by C4.5 from the correctly tagged corpus together with the output of the probabilistic predictor. The decision-tree based post error correction improved results even when the phrase break predictor had poor initial performance. Moreover, the system can be flexibly tuned to a new corpus without massive retraining.
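The correction step described in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the feature names (`pred`, `pos`, `next_pos`, `dist`), the `B`/`NB` break labels, the near/far distance bucketing, and the toy sentence are all hypothetical. It shows how training rows for an error corrector might pair the probabilistic predictor's output with the correct tag, and how the gain-ratio criterion used by C4.5 would pick the attribute to split on first. Tree growth, pruning, and continuous attributes are left out.

```python
from collections import Counter
from math import log2


def build_instances(pos_tags, predicted, gold):
    """Turn one sentence into training rows for the error corrector.

    pos_tags, predicted and gold are parallel lists, one entry per morpheme.
    Each row pairs contextual features (hypothetical names) with the
    correct break label taken from the tagged corpus.
    """
    rows = []
    last_break = None  # index of the most recent *predicted* break
    for i, tag in enumerate(pos_tags):
        # Assumption: before any break, distance is counted from sentence start.
        dist = i - last_break if last_break is not None else i + 1
        features = {
            "pred": predicted[i],  # output of the probabilistic predictor
            "pos": tag,
            "next_pos": pos_tags[i + 1] if i + 1 < len(pos_tags) else "EOS",
            "dist": "near" if dist <= 2 else "far",
        }
        rows.append((features, gold[i]))
        if predicted[i] == "B":
            last_break = i
    return rows


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum(c / total * log2(c / total) for c in Counter(values).values())


def gain_ratio(rows, attr):
    """Information gain of splitting on attr, divided by the split information."""
    labels = [label for _, label in rows]
    groups = {}
    for features, label in rows:
        groups.setdefault(features[attr], []).append(label)
    total = len(rows)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy([features[attr] for features, _ in rows])
    if split_info == 0:
        return 0.0  # attribute takes a single value; splitting is useless
    return (entropy(labels) - remainder) / split_info


def best_attribute(rows):
    """Attribute with the highest gain ratio, i.e. the root split."""
    return max(rows[0][0].keys(), key=lambda a: gain_ratio(rows, a))


# Toy data (hypothetical): the predictor placed one break in the wrong spot.
pos_tags = ["N", "J", "V", "N", "J", "V"]
predicted = ["NB", "NB", "B", "NB", "NB", "NB"]
gold = ["NB", "B", "NB", "NB", "B", "NB"]
rows = build_instances(pos_tags, predicted, gold)
print(best_attribute(rows))
```

Because the predictor's own output is one of the features, a tree trained on such rows learns when to keep and when to override the initial tag. This may be why the abstract reports that the corrector can be retuned to a new corpus without retraining the probabilistic model.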
Similar resources
Statistical/Rule-based Hybrid Phrase Break
In this paper, we present a new phrase break detection architecture that integrates a probabilistic approach with rule-based error correction. The architecture consists of a probabilistic phrase break detector and a transformational rule-based post error corrector. The probabilistic method alone usually suffers from performance degradation due to inherent data sparseness problems. So we adopted ...
Chinese prosody phrase break prediction based on maximum entropy model
A maximum entropy based model for prosody phrase break prediction was proposed in this paper, and a comparison was conducted on large corpora between the new model and the decision tree based model which was the mainstream method for prosody phrase break prediction. The contribution of lexical information and influences of different cutoff values were also investigated. It was demonstrated that...
Incorporating second-order information into two-step major phrase break prediction for Korean
In this paper, we present a new phrase break prediction method that integrates second-order information into a general maximum entropy model. The phrase break prediction problem was mapped into a classification problem in our research. The features we used for the prediction of phrase breaks are of several layers such as local features (part-of-speech (POS) tags, a lexicon, lengths of eojeols and...
Intonational phrase break prediction using decision tree and n-gram model
In the current study, we propose and evaluate a new method for automatic intonational phrase break prediction based on sequences of parts-of-speech and word junctures. The proposed method uses decision trees to estimate the probability of a word juncture type (break or non-break) given a finite length window of part-of-speech values, and uses an n-gram to model the word juncture sequence. Train...
Learning methods and features for corpus-based phrase break prediction on Thai
This paper presents applications of five well-known learning methods for Thai phrase break prediction. Phrase break prediction is particularly important for our Thai text-to-speech synthesizer (TTS), where input Thai text has no word or sentence boundaries. The learning methods include a POS sequence model, CART, RIPPER, SLIPPER and neural network. Features proposed for the learning machines can be ...